Cleanup MTP async data consumer #7423
Conversation
Pull request overview
This pull request refactors the asynchronous data consumer logic in the Microsoft Testing Platform to eliminate arbitrary Task.Delay calls during the drain operation. The changes simplify DrainDataAsync by replacing the retry-with-backoff mechanism with a more deterministic approach: completing the current channel, waiting for all messages to be consumed, then creating a fresh channel for any subsequent messages.
Changes:
- Removed loop detection logic from `DrainDataAsync` that previously detected and threw exceptions for publisher/consumer loops
- Changed `DrainDataAsync` from tracking message counts with delays to a simpler complete-and-restart approach
- Removed the `IEnvironment` dependency and the `TESTINGPLATFORM_MESSAGEBUS_DRAINDATA_ATTEMPTS` environment variable
- Changed channel and consume task fields from readonly to mutable to support channel recreation
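The complete-and-restart drain described above can be sketched with a minimal Python asyncio analogue (the real implementation uses C# channels; the class and member names below are hypothetical, not from the PR): complete the current queue, wait for the consumer to finish everything enqueued, then start a fresh queue and consumer for subsequent messages.

```python
# Conceptual sketch only: Python stand-in for the C# complete-and-restart drain.
import asyncio

_DONE = object()  # hypothetical sentinel standing in for channel completion


class AsyncConsumerDataProcessor:
    def __init__(self):
        self.processed = []
        self._start()

    def _start(self):
        # Fresh "channel" plus a consumer task draining it.
        self._queue = asyncio.Queue()
        self._consume_task = asyncio.create_task(self._consume())

    async def _consume(self):
        while True:
            item = await self._queue.get()
            if item is _DONE:  # channel completed: stop consuming
                return
            self.processed.append(item)

    async def publish(self, item):
        await self._queue.put(item)

    async def drain(self):
        # Complete the current channel and wait for the consumer...
        await self._queue.put(_DONE)
        await self._consume_task
        # ...then restart with a fresh channel for any later messages.
        self._start()


async def main():
    p = AsyncConsumerDataProcessor()
    await p.publish(1)
    await p.publish(2)
    await p.drain()     # everything published so far is consumed
    await p.publish(3)  # publishing still works after the drain
    await p.drain()
    return p.processed


print(asyncio.run(main()))  # → [1, 2, 3]
```

Note that this sketch only restarts the channel after the consumer has fully stopped, which is exactly the window the review comments below flag as racy when publishers run concurrently.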
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Platform/Microsoft.Testing.Platform/Messages/IAsyncConsumerDataProcessor.cs | Changed DrainDataAsync return type from Task<long> to Task, removing message count tracking |
| src/Platform/Microsoft.Testing.Platform/Messages/AsynchronousMessageBus.cs | Removed IEnvironment parameter, loop detection logic, and simplified DrainDataAsync to iterate through processors |
| src/Platform/Microsoft.Testing.Platform/Messages/AsyncConsumerDataProcessor.net.cs | Refactored DrainDataAsync to complete channel and restart with fresh channel; removed message counting and error tracking logic; made channel and task fields mutable |
| src/Platform/Microsoft.Testing.Platform/Messages/AsyncConsumerDataProcessor.netstandard.cs | Refactored DrainDataAsync to complete channel and restart with fresh channel; removed message counting and error tracking logic; made channel and task fields mutable |
| src/Platform/Microsoft.Testing.Platform/Hosts/TestHostControllersTestHost.cs | Removed IEnvironment parameter from AsynchronousMessageBus constructor call |
| src/Platform/Microsoft.Testing.Platform/Hosts/TestHostBuilder.cs | Removed IEnvironment parameter from AsynchronousMessageBus constructor call |
| src/Platform/Microsoft.Testing.Platform/Helpers/EnvironmentVariableConstants.cs | Removed TESTINGPLATFORM_MESSAGEBUS_DRAINDATA_ATTEMPTS constant as it's no longer used |
| test/UnitTests/Microsoft.Testing.Platform.UnitTests/Messages/AsynchronousMessageBusTests.cs | Updated test constructor calls to remove IEnvironment parameter and use var for type inference |
```diff
 public async Task DrainDataAsync_Loop_ShouldFail()
 {
     using MessageBusProxy proxy = new();
     LoopConsumerA consumerA = new(proxy);
     ConsumerB consumerB = new(proxy);
-    AsynchronousMessageBus asynchronousMessageBus = new(
+    var asynchronousMessageBus = new AsynchronousMessageBus(
         [consumerA, consumerB],
         new CTRLPlusCCancellationTokenSource(),
         new SystemTask(),
-        new NopLoggerFactory(),
-        new SystemEnvironment());
+        new NopLoggerFactory());
     await asynchronousMessageBus.InitAsync();
     proxy.SetBuiltMessageBus(asynchronousMessageBus);

     await proxy.PublishAsync(consumerA, new LoopDataA());

     InvalidOperationException ex = await Assert.ThrowsExactlyAsync<InvalidOperationException>(asynchronousMessageBus.DrainDataAsync);
     Assert.Contains("Publisher/Consumer loop detected during the drain after", ex.Message);

     // Prevent loop to continue
     consumerA.StopConsume();
     consumerB.StopConsume();
 }
```
This test expects DrainDataAsync to throw an InvalidOperationException when a publisher/consumer loop is detected. However, the refactored DrainDataAsync implementation no longer includes loop detection logic. The new implementation simply completes the channel, waits for the consume task, and restarts with a fresh channel. This test will now fail or hang indefinitely as the loop will continue after each drain cycle.
```diff
 public async Task DrainDataAsync()
 {
-    // We go volatile because we race with Interlocked.Increment in PublishAsync
-    long totalPayloadProcessed = Volatile.Read(ref _totalPayloadProcessed);
-    long totalPayloadReceived = Volatile.Read(ref _totalPayloadReceived);
-    const int minDelayTimeMs = 25;
-    int currentDelayTimeMs = minDelayTimeMs;
-    while (Interlocked.CompareExchange(ref _totalPayloadReceived, totalPayloadReceived, totalPayloadProcessed) != totalPayloadProcessed)
-    {
-        // When we cancel we throw inside ConsumeAsync and we won't drain anymore any data
-        if (_cancellationToken.IsCancellationRequested)
-        {
-            break;
-        }
-
-        await _task.Delay(currentDelayTimeMs).ConfigureAwait(false);
-        currentDelayTimeMs = Math.Min(currentDelayTimeMs + minDelayTimeMs, 200);
-
-        if (_consumerState.Task.IsFaulted)
-        {
-            // Rethrow the exception
-            await _consumerState.Task.ConfigureAwait(false);
-        }
-
-        // Wait for the consumer to complete the current enqueued items
-        totalPayloadProcessed = Volatile.Read(ref _totalPayloadProcessed);
-        totalPayloadReceived = Volatile.Read(ref _totalPayloadReceived);
-    }
-
-    // It' possible that we fail and we have consumed the item
-    if (_consumerState.Task.IsFaulted)
-    {
-        // Rethrow the exception
-        await _consumerState.Task.ConfigureAwait(false);
-    }
+    _channel.Complete();
+    await _consumeTask.ConfigureAwait(false);
-
-    return _totalPayloadReceived;
+    _channel = new();
+    _consumeTask = _task.Run(ConsumeAsync, _cancellationToken);
 }
```
Critical race condition: When DrainDataAsync completes the channel (line 74) and before creating a new channel (line 77), any concurrent calls to PublishAsync will throw InvalidOperationException when trying to write to the completed channel. This is problematic because DrainDataAsync is called at multiple synchronization points during normal execution (see CommonTestHost.cs lines 223, 229, 245, 249), not just during shutdown. The old implementation avoided this by not completing the channel during drain. Consider using a lock or other synchronization mechanism to atomically swap the old channel with a new one, or ensure no publishing can occur during drain.
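The synchronization the comment suggests can be illustrated with a minimal Python asyncio sketch (the PR itself is C#; `SwappingProcessor` and all member names here are invented for illustration): publishers and the drain acquire the same lock, so the drain atomically detaches the current queue and installs a fresh one, and no publisher ever observes a completed channel.

```python
# Conceptual sketch only: lock-protected atomic queue swap during drain.
import asyncio


class SwappingProcessor:
    def __init__(self):
        self._lock = asyncio.Lock()
        self._queue = asyncio.Queue()
        self.drained = []

    async def publish(self, item):
        # Publishers take the same lock, so they always see a live queue.
        async with self._lock:
            await self._queue.put(item)

    async def drain(self):
        # Atomically detach the current queue and install a fresh one.
        async with self._lock:
            old, self._queue = self._queue, asyncio.Queue()
        # Outside the lock, consume everything the detached queue holds.
        while not old.empty():
            self.drained.append(old.get_nowait())


async def main():
    p = SwappingProcessor()
    # Publishes racing with two drains: no ChannelClosedException analogue
    # is possible, because a completed queue is never visible to publishers.
    await asyncio.gather(*(p.publish(i) for i in range(5)), p.drain(), p.drain())
    await p.drain()  # sweep anything still in the current queue
    return sorted(p.drained)


print(asyncio.run(main()))  # → [0, 1, 2, 3, 4]
```

Every published item lands in exactly one queue generation and is drained exactly once, regardless of how the publishes and drains interleave.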
```diff
 public async Task DrainDataAsync()
 {
-    // We go volatile because we race with Interlocked.Increment in PublishAsync
-    long totalPayloadProcessed = Volatile.Read(ref _totalPayloadProcessed);
-    long totalPayloadReceived = Volatile.Read(ref _totalPayloadReceived);
-    const int minDelayTimeMs = 25;
-    int currentDelayTimeMs = minDelayTimeMs;
-    while (Interlocked.CompareExchange(ref _totalPayloadReceived, totalPayloadReceived, totalPayloadProcessed) != totalPayloadProcessed)
-    {
-        // When we cancel we throw inside ConsumeAsync and we won't drain anymore any data
-        if (_cancellationToken.IsCancellationRequested)
-        {
-            break;
-        }
-
-        await _task.Delay(currentDelayTimeMs).ConfigureAwait(false);
-        currentDelayTimeMs = Math.Min(currentDelayTimeMs + minDelayTimeMs, 200);
-
-        if (_consumerState.Task.IsFaulted)
-        {
-            // Rethrow the exception
-            await _consumerState.Task.ConfigureAwait(false);
-        }
-
-        // Wait for the consumer to complete the current enqueued items
-        totalPayloadProcessed = Volatile.Read(ref _totalPayloadProcessed);
-        totalPayloadReceived = Volatile.Read(ref _totalPayloadReceived);
-    }
-
-    // It' possible that we fail and we have consumed the item
-    if (_consumerState.Task.IsFaulted)
-    {
-        // Rethrow the exception
-        await _consumerState.Task.ConfigureAwait(false);
-    }
+    _channel.Writer.Complete();
+    await _consumeTask.ConfigureAwait(false);
-
-    return _totalPayloadReceived;
+    _channel = CreateChannel();
+    _consumeTask = _task.Run(ConsumeAsync, _cancellationToken);
 }
```
Critical race condition: When DrainDataAsync completes the channel writer (line 72) and before creating a new channel (line 75), any concurrent calls to PublishAsync will throw ChannelClosedException when trying to write to the completed channel. This is problematic because DrainDataAsync is called at multiple synchronization points during normal execution (see CommonTestHost.cs lines 223, 229, 245, 249), not just during shutdown. The old implementation avoided this by not completing the channel during drain. Consider using a lock or other synchronization mechanism to atomically swap the old channel with a new one, or ensure no publishing can occur during drain.
Resolve merge conflicts caused by the TestHostBuilder split (PR #8201) by applying the IEnvironment removal in TestHostBuilder.Framework.cs, where the AsynchronousMessageBus is now constructed.

Rework AsyncConsumerDataProcessor.DrainDataAsync to use a sentinel drain marker queued through the existing channel instead of completing and recreating the channel. This addresses both Copilot review comments:

- No more channel-close race with concurrent PublishAsync calls, because the channel is never completed during a drain.
- Publisher/consumer loops are still detected: AsynchronousMessageBus loops through distinct processors until none of them processed any payload (up to MaxDrainAttempts = 5) and otherwise throws InvalidOperationException with the legacy 'Publisher/Consumer loop detected during the drain after ...' message.

IAsyncConsumerDataProcessor.DrainDataAsync now returns Task<bool> so the bus can detect when a round produced data and another round is needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
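The sentinel-marker drain this commit describes can be sketched in Python asyncio (illustrative only; the real code is C#, and `MarkerDrainProcessor` plus every member name here is invented): the drain enqueues a marker and awaits it, the consumer completes the marker once everything queued before it has been processed, and the channel itself is never closed, so concurrent publishers are unaffected.

```python
# Conceptual sketch only: drain via an in-band sentinel marker.
import asyncio


class MarkerDrainProcessor:
    def __init__(self):
        self._queue = asyncio.Queue()
        self.processed = []
        self._task = asyncio.create_task(self._consume())

    async def _consume(self):
        while True:
            item = await self._queue.get()
            if isinstance(item, asyncio.Future):  # drain marker reached
                item.set_result(None)             # signal: drained up to here
            else:
                self.processed.append(item)

    async def publish(self, item):
        await self._queue.put(item)

    async def drain(self):
        marker = asyncio.get_running_loop().create_future()
        await self._queue.put(marker)
        await marker  # resolves once everything queued before it is processed


async def main():
    p = MarkerDrainProcessor()
    await p.publish("a")
    await p.publish("b")
    await p.drain()
    snapshot = list(p.processed)
    await p.publish("c")  # the channel was never closed
    await p.drain()
    return snapshot, p.processed


print(asyncio.run(main()))  # → (['a', 'b'], ['a', 'b', 'c'])
```

Because the marker travels through the same FIFO channel as the payloads, awaiting it gives the same "everything enqueued so far is consumed" guarantee that completing the channel gave, without the close/recreate window.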
Pushed an updated commit that merges
Side effects:
All 2223
OneLocBuild commit 60e6270 removed the AreEquivalent translations from all FrameworkMessages.*.xlf files even though the corresponding entries were left in FrameworkMessages.resx (PR #8266). This puts every PR's CI build into an out-of-sync state. Regenerating with `msbuild /t:UpdateXlf` restores the entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The same hunk appears in both the AsyncConsumerDataProcessor.net.cs and AsyncConsumerDataProcessor.netstandard.cs variants:

```csharp
        }
        finally
        {
            Interlocked.Increment(ref _processedCount);
        }
```
```csharp
totalPayloadReceived = Volatile.Read(ref _totalPayloadReceived);
try
{
    await _channel.Writer.WriteAsync(AsyncConsumerDataProcessorMessage.CreateDrainMarker(drainMarker), _cancellationToken).ConfigureAwait(false);
```
```csharp
// Maximum number of drain rounds before we consider that a publisher/consumer cycle exists
// and we throw to surface the bug rather than spin forever.
private const int MaxDrainAttempts = 5;
```
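The bus-level loop detection this constant supports can be sketched as follows (illustrative Python, not the actual C#; `drain_until_quiet`, `received_count`, and `FakeProcessor` are hypothetical names loosely mirroring the PR's ReceivedCount mechanism): keep running drain rounds while any processor received new payloads, and fail loudly after the attempt limit.

```python
# Conceptual sketch only: bounded drain rounds with cycle detection.
MAX_DRAIN_ATTEMPTS = 5


def drain_until_quiet(processors, max_attempts=MAX_DRAIN_ATTEMPTS):
    """processors: objects exposing .received_count and a .drain() method."""
    for _ in range(max_attempts):
        before = sum(p.received_count for p in processors)
        for p in processors:
            p.drain()
        after = sum(p.received_count for p in processors)
        if after == before:
            return  # a full round produced no new payloads: the bus is quiet
    raise RuntimeError(
        f"Publisher/Consumer loop detected during the drain after {max_attempts} attempts")


class FakeProcessor:
    # Simulates a consumer that re-publishes for a fixed number of rounds.
    def __init__(self, loops):
        self.received_count = 0
        self._loops = loops

    def drain(self):
        if self._loops > 0:
            self._loops -= 1
            self.received_count += 1  # consuming produced a new payload


drain_until_quiet([FakeProcessor(loops=2)])       # finite chain: terminates quietly
try:
    drain_until_quiet([FakeProcessor(loops=99)])  # endless cycle: gives up
except RuntimeError as e:
    print(e)  # → Publisher/Consumer loop detected during the drain after 5 attempts
```

Raising `max_attempts` plays the same role as the TESTINGPLATFORM_MESSAGEBUS_DRAINDATA_ATTEMPTS escape hatch restored later in the thread: longer but finite producer/consumer chains can still drain, while true cycles are surfaced instead of spinning forever.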
…rride, handle OCE during drain

* AsyncConsumerDataProcessor now exposes ReceivedCount (incremented in PublishAsync). The bus uses it to detect publisher/consumer cycles across drain rounds instead of the per-processor _processedCount snapshot. The previous design under-counted in-flight consumers and could let the bus stop draining with newly produced data still pending, as flagged by Copilot review (AsyncConsumerDataProcessor.net.cs:63 and .netstandard.cs:53).
* AsynchronousMessageBus restores the TESTINGPLATFORM_MESSAGEBUS_DRAINDATA_ATTEMPTS environment variable so that test hosts can raise the limit for longer but finite producer/consumer chains, matching the previous escape hatch. The constructor receives IEnvironment again; TestHostBuilder.Framework.cs and TestHostControllersTestHost.cs pass it through.
* AsyncConsumerDataProcessor.DrainDataAsync now catches OperationCanceledException on the drain-marker write and returns gracefully, matching the previous behavior of bailing out of the drain on cancellation (Copilot review on AsyncConsumerDataProcessor.net.cs:99).
* IAsyncConsumerDataProcessor.DrainDataAsync goes back to returning Task (no longer Task<bool>) since loop detection is now driven by ReceivedCount.
* Regenerate the TrxReport ExtensionResources.*.xlf entries that were dropped by the OneLocBuild check-in on main; otherwise the build fails before our changes are even compiled.
* Test constructor calls pass new SystemEnvironment() to the message bus.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pushed
Other notes:
Local validation:
CI is now in good shape after the latest push:
The PR is
This refactors DrainDataAsync to avoid the arbitrary Task.Delay calls and cleans up the surrounding logic.